Appendix A Algorithm details
A.1 GLASS

Algorithm 1 describes the GAN-based latent space search attack (GLASS). A standard ResNet-18 network is divided into blocks, as shown in Figure 8. For GLASS, we set the learning rate to 1e-2 and the number of iterations to 20,000. For IN, we selected a learning rate of 1e-3 and trained for 30 epochs. The accuracy of each defended model and its corresponding defense hyperparameters are shown in Table 3.

Table 3: Details of defense hyperparameters (we set the split point uniformly to Block3).

We train 50 distributions for Shredder, all of which maintain an accuracy above 77%. As Figure 10 shows, a curve closer to the upper left implies a better privacy-utility trade-off. NoPeek and DISCO achieve the best defensive effect against almost all DRAs.
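The latent space search described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: `generator` and `client_net` are hypothetical stand-ins for the pretrained GAN and the client-side blocks up to the split point, and the loss is assumed to be a simple feature-matching MSE.

```python
import torch

def glass_attack(generator, client_net, target_feats,
                 latent_dim=128, lr=1e-2, iters=20_000):
    """Search the GAN latent space for an input whose intermediate
    features at the split point match the intercepted ones."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)  # lr = 1e-2 as stated above
    for _ in range(iters):
        opt.zero_grad()
        x_hat = generator(z)                  # candidate reconstruction
        loss = torch.nn.functional.mse_loss(  # feature-matching loss
            client_net(x_hat), target_feats)
        loss.backward()
        opt.step()
    return generator(z).detach()              # recovered input
```

Only the latent code z is optimized; the generator and client network stay frozen, which is what confines the search to the GAN's latent space.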
Supplementary Material
A.1 Version 2

We have fixed some bugs in the evaluation code, resulting in slight differences compared to the previous release. In the previous version, 149 samples were not evaluated; these have now been included in the new update.

A.2 Version 3

We have clarified certain statements and added experimental results to address the reviewers' questions.

B.1 Limitations

Despite these advancements, our dataset does exhibit certain limitations, largely stemming from biases inherited from the source datasets:

- We currently address only scenarios where both the question and the answer span a single time duration. Given a question, the annotated time span must be a single, continuous duration, which may be limiting for some scenes.
- Noisy or inaccurate annotations in the source datasets, including captions and timestamps, pose a challenge. Despite our efforts, some of these errors could not be automatically filtered out. The extent of this issue is detailed in the qualitative visualization conducted by our human reviewers, presented in the supplementary material.
- The average duration of ground-truth events in our dataset is relatively long. This has the unintended consequence of hindering models' ability to detect and analyze fine-grained actions within shorter video segments.

These drawbacks highlight areas for potential improvement and indicate the need for ongoing refinement toward more accurate and unbiased video-language models.

B.2 Social Impact

Although we provide an assessment of temporal reasoning and moment localization, the covered types and scene diversity are still limited. We inherit the video classes from the two source video datasets, which may not be sufficient for a comprehensive assessment of all kinds of temporal reasoning. This limitation could introduce a bias.
Neither the curated data nor the video data contains any personally identifiable information. However, some video samples in the source datasets might be mildly uncomfortable depending on the viewer: for example, some videos discuss tattoos and piercings, and some present news about social events such as demonstrations or war reports. We release only the curated question-answer pairs and time spans.
Supplementary Materials for "Private Set Generation with Discriminative Information"
To compute the privacy cost of our approach, we numerically compute Dα(M(D) ‖ M(D′)) in Definition A.1 for a range of orders α [9, 14] at each training step that requires access to the real gradient g_D^θ. Compared with normal non-private training, the major part of the additional memory and computation cost is introduced by the DP-SGD [1] step (for the per-sample gradient computation) that sanitizes the parameter gradient on real data, while the other steps (including the update on S, and the updates of F(·; θ) on S) are equivalent to multiple calls of the normal non-private forward and backward passes, whose costs are of lower magnitude than the DP-SGD step.

GS-WGAN [3]. We adopt the default configuration provided by the official implementation (ε = 10): subsampling rate = 1/1000, DP noise scale σ = 1.07, batch size = 32. Following [3], we pretrain (warm-start) the model for 2K iterations and subsequently train for 20K iterations.

The experiments presented in Section 5.2 of the main paper correspond to the class-incremental learning setting [10], where the data partition at each stage contains data from disjoint subsets of label classes.
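The DP-SGD sanitization step referenced above can be sketched as follows. This is a minimal illustration of the standard mechanism (clip each per-sample gradient to norm C, aggregate, add Gaussian noise with scale σ·C), not the paper's implementation; the function name and arguments are ours.

```python
import numpy as np

def sanitize_gradients(per_sample_grads, clip_norm=1.0, sigma=1.07,
                       rng=None):
    """Clip per-sample gradients, sum, and add Gaussian noise
    (the DP-SGD step that dominates the extra memory/compute cost)."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # scale down any gradient whose norm exceeds clip_norm
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # noise scale sigma * clip_norm, as in standard DP-SGD
    noise = rng.normal(0.0, sigma * clip_norm, size=total.shape)
    return (total + noise) / len(per_sample_grads)
```

The per-sample clipping is what makes this step expensive relative to normal training: it requires one gradient per example rather than a single batch gradient.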
A Evaluation Metrics

A.1 Expected Calibration Error (ECE)

Expected calibration error (ECE) [
The difference between acc and conf can be intuitively seen as the deviation of the outputs from the diagonal in Figure 1. The higher the accuracy of the predictions, the lower the BS. The details of these datasets are summarized in Table 5. Our data are public and contain neither personally identifiable information nor offensive content. The learning rate is 0.01 and the maximum number of iterations is 50.
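The acc/conf gap described above is what binned ECE aggregates. Below is a minimal sketch of the standard equal-width binning formulation, ECE = Σ_m (|B_m|/n) · |acc(B_m) − conf(B_m)|; the function name and signature are illustrative, not from the paper.

```python
import numpy as np

def expected_calibration_error(confs, preds, labels, n_bins=10):
    """Binned ECE: weighted average gap between per-bin accuracy
    and per-bin mean confidence."""
    confs, preds, labels = map(np.asarray, (confs, preds, labels))
    n = len(confs)
    ece = 0.0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confs > lo) & (confs <= hi)  # samples in this bin
        if not mask.any():
            continue
        acc = (preds[mask] == labels[mask]).mean()  # bin accuracy
        conf = confs[mask].mean()                   # bin confidence
        ece += (mask.sum() / n) * abs(acc - conf)   # weighted gap
    return ece
```

A perfectly calibrated model (per-bin confidence equal to per-bin accuracy) yields ECE = 0; larger values mean the outputs sit further from the diagonal of the reliability diagram.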